{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# SciPy - Library of scientific algorithms for Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Parts of this notebook have been taken from:\n",
    "\n",
    "[http://github.com/jrjohansson/scientific-python-lectures](http://github.com/jrjohansson/scientific-python-lectures).\n",
    "\n",
    "The other notebooks in this lecture series are indexed at [http://jrjohansson.github.io](http://jrjohansson.github.io)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The SciPy framework builds on top of the low-level NumPy framework for multidimensional arrays, and provides a large number of higher-level scientific algorithms. Some of the topics that SciPy covers are:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Special functions ([scipy.special](http://docs.scipy.org/doc/scipy/reference/special.html))\n",
    "* Integration ([scipy.integrate](http://docs.scipy.org/doc/scipy/reference/integrate.html))\n",
    "* Optimization ([scipy.optimize](http://docs.scipy.org/doc/scipy/reference/optimize.html))\n",
    "* Interpolation ([scipy.interpolate](http://docs.scipy.org/doc/scipy/reference/interpolate.html))\n",
    "* Fourier Transforms ([scipy.fftpack](http://docs.scipy.org/doc/scipy/reference/fftpack.html))\n",
    "* Signal Processing ([scipy.signal](http://docs.scipy.org/doc/scipy/reference/signal.html))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "\n",
    "* Linear Algebra ([scipy.linalg](http://docs.scipy.org/doc/scipy/reference/linalg.html))\n",
    "* Sparse Eigenvalue Problems ([scipy.sparse](http://docs.scipy.org/doc/scipy/reference/sparse.html))\n",
    "* Statistics ([scipy.stats](http://docs.scipy.org/doc/scipy/reference/stats.html))\n",
    "* Multi-dimensional image processing ([scipy.ndimage](http://docs.scipy.org/doc/scipy/reference/ndimage.html))\n",
    "* File IO ([scipy.io](http://docs.scipy.org/doc/scipy/reference/io.html))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "\n",
    "Each of these submodules provides a number of functions and classes that can be used to solve problems in their respective topics.\n",
    "\n",
    "In this lecture we will look at how to use some of these subpackages."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To access the SciPy package in a Python program, we start by importing everything from the `scipy` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from scipy import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "If we only need to use part of the SciPy framework we can selectively include only those modules we are interested in. For example, to include the linear algebra package under the name `la`, we can do:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import scipy.linalg as la"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Sparse matrices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Sparse matrices are often useful in numerical simulations dealing with large systems, if the problem can be described in matrix form where the matrices or vectors mostly contains zeros. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "`SciPy` has a good support for sparse matrices, with basic linear algebra operations (such as equation solving, eigenvalue calculations, etc)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "There are many possible strategies for storing sparse matrices in an efficient way. Some of the most common are the so-called **coordinate form (COO)**, **list of list (LIL) form**, and **compressed-sparse column CSC (and row, CSR)**. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "*Advantages*\n",
    "- CSR or CSC \n",
    " - Most computations can be efficiently implemented\n",
    "- COO or LIL format \n",
    " - Easy initialization\n",
    " - Adding elements is efficient"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "*Disadvantages*\n",
    "- CSR or CSC \n",
    " - They're not intuitive\n",
    " - Not easy to initialize\n",
    "- COO or LIL format \n",
    " - Operations and computations are not efficient."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "For more information about sparse matrices, see e.g. [Wikipedia](http://en.wikipedia.org/wiki/Sparse_matrix)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Good ol' examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Let's inspect the following matrix:\n",
    "\n",
    "|   |   |   |   |\n",
    "|---|---|---|---|\n",
    "| 1 | 0 | 0 | 0 |\n",
    "| 0 | 3 | 0 | 0 |\n",
    "| 0 | 1 | 1 | 0 |\n",
    "| 1 | 0 | 0 | 1 |\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from scipy.sparse import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 0, 0],\n",
       "       [0, 3, 0, 0],\n",
       "       [0, 1, 1, 0],\n",
       "       [1, 0, 0, 1]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# dense matrix\n",
    "M = array([[1,0,0,0], [0,3,0,0], [0,1,1,0], [1,0,0,1]])\n",
    "M"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "When we create a sparse matrix we have to choose which format it should be stored in. For example, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<4x4 sparse matrix of type '<class 'numpy.int64'>'\n",
       "\twith 6 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# convert from dense to sparse\n",
    "A = csr_matrix(M)\n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "From sparse to dense"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1, 0, 0, 0],\n",
       "        [0, 3, 0, 0],\n",
       "        [0, 1, 1, 0],\n",
       "        [1, 0, 0, 1]], dtype=int64)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# convert from sparse to dense\n",
    "A.todense()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "More efficient way to create sparse matrices: create an empty matrix and populate with using matrix indexing (avoids creating a potentially large dense matrix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<4x4 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 6 stored elements in LInked List format>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A = lil_matrix((4,4)) # empty 4x4 sparse matrix\n",
    "A[0,0] = 1\n",
    "A[1,1] = 3\n",
    "A[2,2] = A[2,1] = 1\n",
    "A[3,3] = A[3,0] = 1\n",
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1., 0., 0., 0.],\n",
       "        [0., 3., 0., 0.],\n",
       "        [0., 1., 1., 0.],\n",
       "        [1., 0., 0., 1.]])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.todense()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Operations\n",
    "\n",
    "We can compute with sparse matrices like with dense matrices:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1., 0., 0., 0.],\n",
       "        [0., 9., 0., 0.],\n",
       "        [0., 4., 1., 0.],\n",
       "        [2., 0., 0., 1.]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(A * A).todense()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1., 0., 0., 0.],\n",
       "        [0., 9., 0., 0.],\n",
       "        [0., 4., 1., 0.],\n",
       "        [2., 0., 0., 1.]])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.dot(A).todense()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1., 0., 0., 0.],\n",
       "        [0., 9., 0., 0.],\n",
       "        [0., 1., 1., 0.],\n",
       "        [1., 0., 0., 1.]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.multiply(A).todense()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## CSR, CSC and COO\n",
    "\n",
    "There are a few different formats in which your sparse matrix might be encoded\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Remember our original matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 0, 0],\n",
       "       [0, 3, 0, 0],\n",
       "       [0, 1, 1, 0],\n",
       "       [1, 0, 0, 1]])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## COO - Coordinate format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<4x4 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 6 stored elements in COOrdinate format>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_coo = coo_matrix(A)\n",
    "A_coo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We have access to the inner data structure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 2, 3, 3], dtype=int32)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_coo.row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 1, 2, 0, 3], dtype=int32)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_coo.col"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1., 3., 1., 1., 1., 1.])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_coo.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The problem of this data structure is that we cannot efficiently index elements. How can we get the elements in row 3? We would have to loop the row field until we find the first 3, which is in position 5."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "\n",
    "Too slow for matrices of even few hundred rows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "What if we had a pointer to the exact starting position of every row?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## CSR - Compressed Sparse Row format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1., 0., 0., 0.],\n",
       "        [0., 3., 0., 0.],\n",
       "        [0., 1., 1., 0.],\n",
       "        [1., 0., 0., 1.]])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_csr = csr_matrix(A)\n",
    "A_csr.todense()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 1, 2, 0, 3], dtype=int32)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_csr.indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 4, 6], dtype=int32)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_csr.indptr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1., 3., 1., 1., 1., 1.])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_csr.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "In this case:\n",
    "\n",
    "- `indices`: Refers to the column index, same as `.col` in COO format\n",
    "- `data`: Refers to the value contained in the cell\n",
    "- `indptr`: It is a row-pointer. `indptr[i]` refers to the first cell in indices and data which contains elements of row `i`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Let's extract row 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "target_row = 3\n",
    "\n",
    "row_start = A_csr.indptr[target_row]\n",
    "row_end = A_csr.indptr[target_row+1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 3], dtype=int32)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "row_columns = A_csr.indices[row_start:row_end]\n",
    "row_columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1., 1.])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "row_data = A_csr.data[row_start:row_end]\n",
    "row_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Double check with the original matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 0, 0, 1])"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M[target_row,:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## WARNING! \n",
    "\n",
    "CSR is made for fast **row** access, CSC for fast **column** access. Do not mix them"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Let's compare the speeds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<10000x5000 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 2500000 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "big_matrix_csr = random(10000, 5000, density=0.05, format='csr')\n",
    "big_matrix_csr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<10000x5000 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 2500000 stored elements in Compressed Sparse Column format>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "big_matrix_csc = big_matrix_csr.tocsc()\n",
    "big_matrix_csc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "610 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "for row_index in range(big_matrix_csr.shape[0]):\n",
    "    # Do something\n",
    "    useless_row = big_matrix_csr[row_index]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "43.5 s ± 1.46 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "for row_index in range(big_matrix_csc.shape[0]):\n",
    "    # Do something\n",
    "    useless_row = big_matrix_csc[row_index]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Ouch! That is roughly a 68x difference\n",
    "\n",
    "It's very easy to waste a lot of time by using data structures in an incorrect way"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Another example for speed critical code. \n",
    "\n",
    "We want the item indices seen by a user. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The simple implementation would be the following. Users are rows and the items are columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<100000x1500 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 1500000 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URM = random(100000, 1500, density=0.01, format='csr')\n",
    "URM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6.33 s ± 796 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "for user_id in range(URM.shape[0]):\n",
    "    # Do something\n",
    "    user_seen_items = URM[user_id].indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "62.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "for user_id in range(URM.shape[0]): \n",
    "    # Do something\n",
    "    user_seen_items = URM.indices[URM.indptr[user_id]:URM.indptr[user_id+1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Even for this simple operation there is a difference of 2 orders of magnitude\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Why?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Let's see what the first version of the code does..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Step 1 - slicing row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<1x1500 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 15 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_id = 5\n",
    "user_row = URM[user_id]\n",
    "user_row"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Step 2 - get column indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  11,   23,   53,  405,  414,  576,  668,  702, 1168, 1214, 1216,\n",
       "       1236, 1304, 1366, 1472], dtype=int32)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_seen_items = user_row.indices\n",
    "user_seen_items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now, let's go into the second version of the code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Step 1 - slicing row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(75, 90)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_id = 5\n",
    "user_indices_start = URM.indptr[user_id]\n",
    "user_indices_end = URM.indptr[user_id + 1]\n",
    "\n",
    "user_indices_start, user_indices_end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Step 2 - get column indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  11,   23,   53,  405,  414,  576,  668,  702, 1168, 1214, 1216,\n",
       "       1236, 1304, 1366, 1472], dtype=int32)"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_seen_items = URM.indices[user_indices_start:user_indices_end]\n",
    "user_seen_items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Did you see the difference?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The reason for the big performance loss is that in the first way we are building a new sparse matrix for each user. We don't need the matrix itself, only one of its attributes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Further reading"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* http://www.scipy.org - The official web page for the SciPy project.\n",
    "* http://docs.scipy.org/doc/scipy/reference/tutorial/index.html - A tutorial on how to get started using SciPy. \n",
    "* https://github.com/scipy/scipy/ - The SciPy source code. "
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
